Probabilistic retrieval models : relationships, context-specific application, selection and implementation

نویسنده

  • Jun Wang
چکیده

Retrieval models are the core components of information retrieval systems, which guide the document and query representations, as well as the document ranking schemes. TF-IDF, binary independence retrieval (BIR) model and language modelling (LM) are three of the most influential contemporary models due to their stability and performance. The BIR model and LM have probabilistic theory as their basis, whereas TF-IDF is viewed as a heuristic model, whose theoretical justification always fascinates researchers. This thesis firstly investigates the parallel derivation of BIR model, LM and Poisson model, wrt event spaces, relevance assumptions and ranking rationales. It establishes a bridge between the BIR model and LM, and derives TF-IDF from the probabilistic framework. Then, the thesis presents the probabilistic logical modelling of the retrieval models. Various ways of how to estimate and aggregate probability, and alternative implementation to nonprobabilistic operator are demonstrated. Typical models have been implemented. The next contribution concerns the usage of of context-specific frequencies, i.e., the frequencies counted based on assorted element types or within different text scopes. The hypothesis is that they can help to rank the elements in structured document retrieval. The thesis applies context-specific frequencies on term weighting schemes in these models, and the outcome is a generalised retrieval model with regard to both element and document ranking. The retrieval models behave differently on the same query set: for some queries, one model performs better, for other queries, another model is superior. Therefore, one idea to improve the overall performance of a retrieval system is to choose for each query the model that is likely to perform the best. This thesis proposes and empirically explores the model selection method according to the correlation of query feature and query performance, which contributes to the methodology of dynamically choosing a model. In summary, this thesis contributes a study of probabilistic models and their relationships, the probabilistic logical modelling of retrieval models, the usage and effect of context-specific frequencies in models, and the selection of retrieval models.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Improved Skips for Faster Postings List Intersection

Information retrieval can be achieved through computerized processes by generating a list of relevant responses to a query. The document processor, matching function and query analyzer are the main components of an information retrieval system. Document retrieval system is fundamentally based on: Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...

متن کامل

Improved Skips for Faster Postings List Intersection

Information retrieval can be achieved through computerized processes by generating a list of relevant responses to a query. The document processor, matching function and query analyzer are the main components of an information retrieval system. Document retrieval system is fundamentally based on: Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...

متن کامل

Probabilistic Models for Designing Motion and Sound Relationships

We present a set of probabilistic models that support the design of movement and sound relationships in interactive sonic systems. We focus on a mapping–by–demonstration approach in which the relationships between motion and sound are defined by a machine learning model that learns from a set of user examples. We describe four probabilistic models with complementary characteristics in terms of ...

متن کامل

Complementarity , biodiversity viability analysis , and policy - based algorithms for conservation

Biodiversity conservation “area-selection” strategies include not only trade-offs among society’s needs in land-use allocation, but also allocation of economic instruments such as incentives, levies, and biodiversity credits. For these applications, the key property of an area is its “complementarity”—the context-dependent, marginal gain in biodiversity provided by the area. Given that there ha...

متن کامل

تحلیل مدل‌های ارزیابی و موفقیت کتابخانه‌های دیجیتالی

: This paper discusses the dominant models of a broader field of success and evaluation of digital libraries and seeks the relationship between the models and their origins. The main objectives of the paper are recognizing digital libraries’ key success and evaluation models, holistic representation of the links between DLs’ evaluation and success models and frameworks in a unique window, and i...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011